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Abstract 

Mean field replica theory is employed to analyze the freezing transition of 
random heteropolymers comprised of an arbitrary number (g) of types of 
monomers. Our formalism assumes that interactions are short range and 
heterogeneity comes only from pairwise interactions, which are defined by 
an arbitrary q x q matrix. We show that, in general, there exists a freezing 
transition from a random globule, in which the thermodynamic equilibrium 
is comprised of an essentially infinite number polymer conformations, to a 
frozen globule, in which equilibrium ensemble is dominated by one or very few 
conformations. We also examine some special cases of interaction matrices 
to analyze the relationship between the freezing transition and the nature of 
interactions involved. 

PACS 61.41. +h., 64.60. Cn, 87.15.Da, 64.60.Kw 
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Typeset using REVT^ 



I. INTRODUCTION 



The relationship between the sequence and conformation of a heteropolymer is one of 
the most challenging unsolved problems in biophysics. In the case of proteins, it is widely 
believed that the native functional conformation is, in a sense, "written" in the sequence 
of the heteropolymer in the "language" of the interactions between monomer species. This 
conformation is also believed to be both the ground state from thermodynamic point of 
view (better to say, it is structurally very close to the ground state, up to some short scale 
thermal and/or frozen fluctuations) and reliably accessible from the kinetic point of view. 

The fact that even chains with random sequences can have a unique frozen ground state 
was first discussed in terms of phenomenological models where the freezing transition 
was shown to be similar to that of the Random Energy Model (REM) [0]. The REM-like 
freezing transition was also derived starting from a microscopic Hamiltonian in which the 
interactions between pairs of monomers were assumed to be random, independently taken 
from a Gaussian distribution 0. In this model, the nature of interactions between species 
was parameterized in terms of the mean and width of the monomer-monomer interaction 
distribution. Thus, in this sense, polymer sequence was not explicitly included in this model, 
since it is absent from the Hamiltonian. As for models with polymer sequences explicitly 
present, two have been considered so far: 2 letter Ising-type model |^ and the so-called 
p-charge model 1^,^. These models were shown to also exhibit a freezing phase transition 
for random chains. 

Therefore, it is natural to conjecture that any sort of random heteropolymer will have this 
kind of transition, and the question is whether we are able to understand the properties and 
characteristic temperature of this transition for realistic models of heteropolymers. Indeed, 
proteins, for example, are comprised of 20 kinds of different monomers, which interact to 
each other in a complicated manner. There are several relevant types of interactions between 
different monomers, such as van-der-Waals interactions, dipole-dipole interactions, hydrogen 
bonds, and hydrophobic interactions. 
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As long as we are speaking about short-range interactions, interactions can be described 
in terms of a matrix: if there are q types of monomers, we have a g x g matrix, where each 
matrix element represents the energy of interaction between monomers of the types i 
and j, given that they are in spatial contact. There were several attempts in the literature 
to derive this kind of "interaction" matrix for real amino acids (see, in particular, [Q). It 
is rather difficult, however, to derive this kind of matrix. Furthermore, the sensitivity of 
heteropolymer properties to deviations of the interaction matrix is unclear. For computer 
simulations, for example, it is important to know how precise one should be in choosing the 
interaction energies in order to reproduce the native state and to avoid the appearance of 
some other state, structurally completely different, which may appear as the ground state 
of a simulated system due to an imperfect interaction matrix. Of course, other non-protein 
heteropolymers might be also of interest. 

In this paper, we consider the freezing transition for a heteropolymer with an arbitrary 
interaction matrix. We derive a general formalism for the analysis of the freezing transition 
of random chains in which only short range interactions are assumed. In addition to the 
formal benefit that the general treatment establishes a formalism with which other short 
range species interaction models can be derived as special cases by using specific interaction 
matrices, this theory can be used to analyze what properties of a species-species interactions 
matrix effect the freezing transition and in what way. 

II. DEVELOPMENT OF THE FORMALISM 

A. The Model and its Hamiltonian 

Consider a heteropolymer chain with a frozen sequence of monomers s/, where / is 
the number of monomer along the chain (1 < / < A^) and S/ is the sort of monomer 
I in the given sequence. Let q be the total number of different monomer species, 1 < 
s{I) < q- In the condensed globular state, the spatial structure of the chain is governed by 
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volume interactions between monomers. The disorder and heteropolymer effects of different 
monomer species comes mainly through pairwise monomer-to-monomer interactions. On the 
other hand, higher order interactions provide the non-specific excluded volume effect, while 
chain connectivity defines the set of available placements of monomers in space. This is clear 
when one considers the lattice model, where subsequent monomers are nearest neighbors on 
the lattice (chain connectivity): a site on the lattice can be occupied by only one monomer 
(excluded volume effect), and the energy is given as a sum of pairwise interactions of the 
nearest neighbors on the lattice. The complicated set of monomer-monomer interactions, 
related to frozen-in sequence, appears then due to the restricted set of pairings of monomers 
in the space. The interaction part of Hamiltonian can be therefore written in a rather simple 
way: 

q N 

^ = E E BiAri - rj)6isj, t)6isj, j) + W (1) 

where Bij6{rj — rj) gives the Mayer function of short range interaction between monomers 
of species i and j, placed in space at the distance r/ — rj apart from each other, s/ is the 
species of monomer number / ( "spin" of monomer J), and 5 is either Kroneker or Dirac delta. 
Eq (|I|) has the simple interpretation that monomers number / and J interact based upon 
their proximity, 5(r/ — rj), and the second virial coefficient of interaction between the species 
of the two monomers, B^j^j. The Ti' contribution contains all higher order interactions of 
monomers. We assume that it is "homopolymeric" in form, i.e. it does not depend on the 
monomer species, but only on the overall density p. It can be written as Ti' = Cp^+Dp^ + . . ., 
where all virial coefficients C,D, . . . are assumed to be positive (repulsive). 

Throughout the paper, we will use the following notation: upper case Roman characters 
label monomer numbers, i.e. bead number along the chain (1 < / < N), lower case Roman 
characters label monomer species numbers {1 < i < q), and lower case Greek characters are 
for replica indices {1 < a < n), which will be defined later. We will be also using the notation 
for vectors and operators (matrices) with the clear indication of the dimensionality of the 
corresponding space, as we consider several different spaces simultaneously. For example, 
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the interaction matrix with matrix elements Bij will be denoted as B^'^\ In this notation, 
vector means the density distribution Pi(R) for all species (q) over 3D R-space (oo). 



B. Replicas 



The statistical mechanics of a heteropolymer chain is expressed through the partition 
function, which can be somewhat formally written as 

1 



Z(seq) = J2 

conformations 



-H (conf , seq) 



(2) 



where we have clearly indicated that our Hamiltonian depends on both conformation and 
sequence. The standard way to approach the partition function of a system with frozen 
disorder is to employ, first, the principle of self-averaging of free energy and, second, the 
replica trick: 

1 



F = (F(seq))seq = -r(lnZ(seq))seq = lim i^l^^^ll^ 

71 >U fl 



(3) 



where (. . .)seq means average over the set of all possible sequences. 

In this paper, we consider random sequences, meaning that the species 1,2, ... ,q appear 
independently along the chain with the probabilities Pi,P2, ■ ■ ■ ,Pg {{Pi} = P^'^^)^ so that the 
probability of realization of the given sequence (seq = si, S2, ■ ■ ■ , sj, . . . , sn) is written as 



N 



^seq = PsiPs2 ■■■Psi ■■■PsN = li Psi (4) 

1=1 

Collecting the above equations, we can write the key value of n-replica partition function 



as 



(Z-(seq)), 



seq 



E 



exp 



(conf, seq) 



E ^seq E exp 



seq 



1 " 
^ a=l 



(5) 



where Ca — C\, . . . ,Cn stand for conformations of replica number a. 

For each conformation and each replica, we introduce density distributions of all species 

as 



N 

mt{R) = Y.S{si,t)5iv'}-R); {mf (R)} ^ m^^"-). (6) 
1=1 

For simplicity, we will not explicitly include the sequence independent terms Ti' from the 
original Hamiltonian (|I]). We then write in terms of the densities 

(Z"(seq)),,^ = E ^seq E exp I E E / dRidR, (Ri)%5(Ri - R2)m^"(R2) I 

seq Ci,...,Cn [ a=li,j=l-^ ) 

= E^scq E expj-;^ (m|5|m)^''"°°H , (7) 

where (|. . means scalar product in which all vectors and operators are supposed to 

have dimensionality as indicated ( g x n x cxd in this case). Operator is with 

respect to monomer species, and it is diagonal in both replica space and real coordinate 

space, meaning that it has matrix elements Bij6af3S(Ili — R2). The next step is to perform 

Hubbard-Stratonovich transformation of the form 

T 

d.-.^Cn" ^ seq 



(Z"(seq)),eq = Ar E [^{'P} exp|^(0|5-i|0)^'"°°HxE^seqexp| 



(gnoo) 'I 

m \ 



(8) 

Here {0"(R)} = ^(^"oo) ^^e the fields conjugated to the corresponding densities and M is 
normalization factor which comes from integration over 0. 

Note that the sum over sequences enters only in the last "source" term of this expression: 

exp {source term} = E -Pseqexp | m^'^ H . (9) 

seq ^ ^ 

The summation, or average, over the sequences is easier to describe in non-vector notation: 

q N q ( n \ 

exp {source term} = E II Psj H exp < i) E i ^R 4>^{R)5{jc^ — R)} 

Si,S2,...,sjv 7=1 i=l I a=l ) 

N q q ( n „ ~| 

= n E n exp 5{si, i)Y. dR 0?(R)5(r? - R) 

I=lsi=l i=l I 0=1"^ J 

= n EP^exp dR 0r(R)'5(r? - R) (10) 

As in case of two-letter heteropolymer, to extract the relevant order parameters, we 
expand over the powers of the fields (high temperature expansion) and keep terms up to 



q n 

source term = J2J2 rfRp"(R)piC(R) 

i=l a=l 

q n „ „ 

+ 2 E lP^^V ~P^Pj\ E y^I^l C(Rl)Qa/3(Rl,R2)0f(R2) , (H) 



where we use standard definitions 11E,10 



N k 

g„„...,,,(Ri, . . . , Rfc) = 5] n ^(rr - R-k) , (12) 

1=1 K=l 

AT JV 

Q„(R)=p"(R) = ^5(r?-R) ; Q,^(Ri, R2) = J] 5(r? - Ri)5(r? - R2) . (13) 
/=i 1=1 

Note that the total density of the polymer chain p"(R) in equilibrium does not depend on 

replica number and, within a large globule, does not depend on R. Replicas are interpreted 

as pure states of the polymer chain and the fc-replica order parameter Qai,...,afc is 

interpreted as the overlap between replicas ai, . . . , a^. 

The n-replica partition function is now written in the form: 



Cl,...,Cn 

X exp '^BT-}6{R, - R2)Sap + ^Qa(3{'Ri,^2)\j 0) + (p| (py^^^^'^j (14) 
where 

TV 

A,, =M,-p,p, and p-<''"°°)=pr(R)=PiE^(r?-R)- (15) 

7=1 

We are left with a Gaussian integral ( p!4D for the n-replica partition function, which is 
simplified by the argument given in [p|,^p!0|, showing that the R-dependence of Qa/s is of 
5-type, so that 

g„;3(Ri, R2) = PQc^pSiRi - R2) , (16) 

where diagonal matrix elements of new matrix g^"^ are 1, while off-diagonal elements are 
either or 1. This means physically that two replicas a and (3 might be either uncorrelated 
(independent), so that Q^p = 0, or they may be correlated so that one repeats the 3D fold 



of the other down to the microscopic length scale, so that Qq,/j(Ri, R2) = p5(Ri — R2)- We 
do not repeat this argument here, as it is explained elsewhere (see the argument presented 
in [|10| which is slightly different from the original one 0). 



C. Effective Energy in Replica Space 

With the simplified form of Q matrix, we evaluate the Gaussian integral over all 
variables. This yields 



(Z"(seq))seq = E exp[-iVE{g}] 

Cl,...,Cn 



(17) 



with the energy of the form 



E 



(<?) 



+ -ln det 
2 



-(fi^-^)) ^ ® + ipg(") ® A^"?) + -Indet (45('?"V^) • 



(18) 



.4 V y 2' ' J 2 

Here ® means the direct product, eg. for the block matrix iJ^^noo) _ Bij6ai36{Ili — R2) = 
^(9) (g)/(") (g)/(oo)_ jj-^ general, A^''''^ B^^'^ produces block matrix of the total size rs, according 
to the rule: instead of each matrix element of A^^^ matrix, say Auv, we substitute the block 
equal to AuvB^'^\ The last term in (|18]) comes from normalization factor J\f in eq (|I^); it is 
easy to check that the normalization factor created by Gaussian integration A/", simply elim- 
inates normalization factors first introduced by the Hubbard-Stratonovich transformation. 
Noting that det (A) ■ det(i?) = det{AI3), we can simphfy the last relationship as 



E 



In det 



+ P 



T 



(19) 



We have assumed that the Gaussian integral converges and can be calculated. This is 
guaranteed only by the appropriate form of g*^"-* matrix, i.e., by replica symmetry breaking. 
We make an ansatz that of the form of a Parisi matrix with one-step replica symmetry 

breaking We say that replicas can be gathered into n/x groups each of which consists 



of X replicas. The conformations of all of the replicas in a given group coincide to the 
microscopic scale, i.e. for a, (3 E group A and 7 G group B, then q^js = 1 and g^^ = qjs^ = 0. 
Thus g*'"''-* can be written as a block matrix (in replica space) which is partitioned into n/x 
blocks of size x x x along the diagonal. Inside each diagonal block, = 1, and outside 
Qa/B = 0. In fact, it was recently shown that this form can be derived by energy minimization 
in the two letter case [|lOl, and we can easily repeat this argument for the general g-letter 



case at hand. For the sake of simplicity, however, we omit the derivation, thus, formally 
employ the ansatz. 

We can substantially simplify both terms in the energy (|19|), and convert them into the 
form 



E=^ln det + ^Ab) +n(p 



p) (20) 



Here we have dropped the labels of the dimensionality of the vectors and operators, as all of 
them are of the same dimensionality (g). This is because we have diagonalized the energy 
in both R {00) and replica (n) spaces, so only the species dimension (g) remains. 

The proof of the simplification leading to ( pil| ) is given in the Appendix. We now turn 
to its analysis. 



D. Effective Entropy in Replica Space 

In order to get the free energy, we must also consider the entropy change due to the 
constraint on g^^. Following Refs. 0,0], we argue that due to the polymeric bonds connecting 
monomers along the chain, once one monomer is fixed in space, the next must be placed 
within a volume a^, where a is the distance between monomers along the chain. Since 
replicas that belong in the same group coincide within a tube of radius Rt ^ , where v is 
the excluded volume of a single monomer, there are a^/v ways to place the next monomer 
and thus the entropy per monomer in just ln(a^/i;). But since all replica conformations 
coincide within the group, we must restrict the position of the next monomer to a single 
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place. Following the Parisi ansatz for one-step RSB, for n replicas, there are n/x groups with 
X replicas per group. The entropy loss is therefore 



n 



S^Ns-(x-l) 



X 



(21) 



where s — ln(a^/^;) is related to the flexibility of the chain. 

E. Freezing Transition 

Recall that, for notational convenience, we drop the indication of dimensionality, as all 
operators and vectors are now assumed to be in species space, i.e. dimensionality q. We 
optimize the free energy 



F = ^Indet (j + ^A5^ + n /p 



with respect to x, yielding 



p) + s-(x - 1) (22) 

/ X 



2s = Indet ( / + ^AB^ - Tr 



+ {P 



T T \ T J 



p) ■ 



(23) 



As is clear from the very structure of this equation, its solution is of the form x = T^/2p, 
where ^ is given by 



2s = lndet(/ + ,^A5) -Tr ^AB(T+^AB) ^ + /p ^'^BAB(T + ^AB 



p 



(24) 



Recall that x is the number of replicas in one group, i.e., the number of replicas which 
have the same conformation down to microscopic fluctuations. This interpretation is clear 
when n is integer and n > 1. While taking the n — > limit, we have to consider x to be in 
between n and 1, so that x < 1 means the existence of grouping of replicas, or broken replica 
permutation symmetry, while x approaching 1 means the restoration of replica symmetry. 
Therefore, x — 1 defines the point of phase transition between the frozen globular phase 
with broken replica symmetry and the phase of a random "liquid-like" replica symmetric 
globule. The corresponding freezing temperature is given by Tf — 2p/^. 
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Thus, from the n-rephca free energy, we obtain the real free energy 



^Indet {f+^AB 
f Indet (l + ^AB 



V 



pBlI + ^AB 



P 



s{Tf-T) iorT<Tf 



P 



pB(T+ ^AB 



P 



for T>T 



(25) 



III. DISCUSSION 



A. What is A? 



We first examine the physical meaning of the operator A and the term AB. From the 
definition of A, we have 



(^AS) = J2 (Pi^ij - PiPj) Bjk = PiB^k - ^PiPjBij (26) 
i 3 

We can always write Bij in terms of a sum of a homopolymeric attraction {Bq) and het- 
eropolymeric deviations {hij = Bij — (B)). From (^), we see that A removes the mean 
interaction of species k from all matrix elements Bkj. In other words, A removes all ho- 
mopolymeric effects. 

It is instructive to examine what happens to the energy ( PO] ) when one formally takes 
AB = 0; in this case 

E = n (p/T) {p\B\p) = n {p/T) ^ = n (p/T) (B) , (27) 

which is simply the averaged second virial term. Note that as this term is not coupled 
to X, (B) does not enter into the calculations of the freezing temperature. We note that 
the terms nTi' = nCp/T + . . . from the original Hamiltonian (|l|) are not explicitly written, 
but must be considered when optimizing the free energy. Thus, for | (B) \ ^ \bij\, we can 
optimize the free energy with respect to x and p independently. However, if this condition 
is not valid, the coupling between density and the replica overlap order parameter becomes 
significant; this should lead to other interesting physical phenomena, which are beyond the 
scope of this paper. The "homopolymeric" attractive second virial term, in competition with 
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the repulsive higher order terms in 7i', is responsible for the formation and maintenance of 
the globular conformation with a reasonably high density. Therefore, {B) primarily enters 
into homopolymer effects, such as the coil to globule transition. Other effects, such as the 
freezing transition, are purely heteropolymeric, and are due to hij, or Ai? terms; they are 
related to the choice of some energetically preferential conformations out of the total vast 
number of globular conformations. 

For the homopolymer case (g = 1 or Bij = Bq) or the effective homopolymer case (a 
heteropolymeric interaction matrix is rendered homopolymeric due to the choice of compo- 
sition p; say, pi = 1, while others pi = 0), we immediately see that AB = 0, so T/- = and 
thus there is no freezing transition. (This is of course just trivial check of consistency of our 
equations) . 



B. Two Exactly Solvable Models 

There are some models which can be solved exactly from eq (^). We will see that the 
exact solution of simple models yields insight which will be important in the more general 
consideration of the next section. 



1. Potts Model 

Potts interactions are defined by the interaction matrix Bij = b6ij + Bq. The freezing 
temperature can be found exactly for this model for the case of even composition, i.e. 
Pi = 1/q. From (PDj), we see that the relevant matrix to address is / + 2pxAB /T. As the 
diagonal elements of this matrix are 1 + 2bpx{q — 1)/Tq^ and all the off diagonal elements 
are equal —2hpx/Tq^, we find a (g — l)-fold degenerate eigenvalue 1 + 2bpx/Tq and a non- 
degenerate eigenvalue of 1 (see the Appendix for details). This leads to the energy term of 
the form 

Indet (j+ ^Ae) = {q- l)ln (l + ^) (28) 
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Note that this term vanishes for the homopolymer (g = 1) case. As for the other term of 
the energy (|20|), it reduces to 



P 



p = \p 



T 



p) , (29) 



i.e., to the average second virial term (0). This term does not contribute to optimization 
with respect to x. We find the freezing temperature 

Tf = ^ ~fP (30) 
^ qE{2s/[q-l]) ^ ^ 

where S(cr) is given self-consistently by 

r SV2 for S < 1 

S(a) : a = ln(l-S)+S/(l-S)^ . (31) 

[l/(l-S) forS^l 

We see that the freezing temperature decreases with increasing q. Physically, this corre- 
sponds to the fact that in the Potts model, all monomers from differing species interact 
with each other in the same way, so that the part of the chain without similar monomers is 
effectively homopolymeric. As q increases, these homopolymer-like regions increase and the 
freezing temperature consequently decreases. When b is negative (positive), we have physi- 
cal solutions of Tf for positive (negative) S. We see from eq. (|3l|) that the nature of the S 
function is different positive and negative values: there is a singularity at H = 1, whereas 
S < is well behaved. Thus, there is a fundamental difference between ferromagnetic-like 
(6 < 0) and antiferromagnetic-like {b > 0) interactions in terms of the freezing behavior. 

Two simplified asymptotic expressions for Tf can be mentioned, coming from the two 
asymptotics of the S((t) function (pT|): 

{—{pb/ \fs){\/q — 1/^) for effectively flexible chains, 2s/(g — 1) ^ 1 
• (32) 
-{2pb/q)[l + (g - l)/2s] for effectively stiff chains, 2s/(g - 1) > 1 

Recall that the parameter s = ln(a^/ v) is related to chain flexibility [|^, where a and v are the 

chain spacer size and monomer excluded volume, respectively; s is small for flexible chains, 

and large for stiff ones. Note, that the regions of applicability of the two asymptotics in ( pij ) 

are controlled by what can be called the effective flexibility a = s/{q — 1). Physically, this 
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corresponds again to the specific nature of Potts interactions. Indeed, tlie main difference 
between flexible and stiff chains is the number of neighbors along the chain in the interaction 
sphere in space around a given monomer. This number is large for flexible chains and small 
for stiff chains. As for Potts interactions, what is relevant is how many neighbors along the 
chain attract a given monomer. This number is obviously reduced by factor q — 1, and this 
explains the appearance of the effective flexibility s/{q — 1). 



2. p-charge 



In the p-charge model each monomer has a set of p generalized charges, which can 
be = ±1. The Hamiltonian is defined to be 

N p 



(33) 



I,J k=l 

In the interaction matrix, we define each possible combination of charg different 
species. Thus, there are q = 2^ species in the interaction matrix. For species number i 
(1 < z < g), the value of charge k is given by s'^{i) = 2 mod 2^—1, where [. . .J 

means truncate to the lowest integer. Thus, we have an interaction matrix of the form 



2^ 



mod 2-1 



J_ 

2k 



mod 2-1 



(34) 



The AI3 matrix has p non-zero eigenvalues Xi^X2, ■ ■ ■ ,Xp ^ (2^ — p)-degenerate 
eigenvalue of 0. Thus, 

lndet(/+?|^AB) = |:in(l + Bf^) 

and as in the Potts case, 



(35) 



p 



P) = [P 



p_ 

T 



B 



P 



(36) 



Thus, the freezing temperature is determined by 



2s =j: 



i=l 



In 1 + 



Tf J l + 2x^p/Tf\ 



(37) 
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For the specific case Xi = Xi we have 

where S(cr) function is defined above by (0). As in Potts interactions, the asymmetry of 
the S function yields different behavior, depending on the sign of x- Unlike the Potts case, 
the behavior of the p-charge model becomes more heteropolymeric, i.e. Tf increases, with 
the addition of more species. 

The two asymptotics, for flexible and stiff chains, in p-charge model are 

{— pY(2n/s)^^^ for effectively flexible chain, s/p <^ 1 
' . (39) 

— 2px(l +p/2s) for effectively stiff chain, s/p ^ 1 

Note, that effective flexibility is given by a = s/p for the p-charge model, i.e. it is again 
reduced by the number of species. 

We note that our result (|38|) reproduces automatically what is trivially expected for the 
homopolymer case {Tf = 0, i.e., no freezing, when p = 0) and also at p = 1 agrees with our 
previous result (|30|) at g = 2 in the case of two letter Ising heteropolymer. On the other 
hand, our equation (pH]), or its asymptotics in the flrst line of eq (pQ]), agrees with earlier 
results of the work in the opposite extreme of p ^ 1, i.e., in the region of applicability 
of that work. 



C. Reduction Theorems 

There are several cases in which the same physical system can be depicted in terms 
of formally different interaction matrices B and/or composition vectors p. Clearly, the 
expression for the freezing temperature, as well as for any other real physical quantity, must 
not depend on any arbitrary choice. 

For example, there might be some monomer species which are formally included in the 
list, and in the interaction matrix, but they are not physically presented in the chain, as the 
corresponding p vanishes, say, Pg = 0. It is easy to check, that in this case eq is reduced 
to smaller list of g — 1 monomer species with (g — 1) x [q — 1) interaction matrix. 
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Another example is when there are duphcate species, say, species labeled q and q — 1 
are physically identical, i.e. they interact in identical ways to all other species. Physically, 
we would expect that this problem is identical to the g — 1 species case, except with the 



new composition p'^_i = pg^i +Pg. Even though we skip the proof, eq (p^ ) indeed gives this 
expected reduction. 

These two statements, which we call "reduction theorems" , are not only a good check of 
consistency of our result (p4|), but they will be also important in further discussion. 



D. Freezing Temperature: General Consideration 

We return to the general analysis of the equation (^) for the freezing temperature, and 
we will show how to implement in the general case both the limits of stiff and flexible chains, 
similar to how those cases appear in the exact solutions for the Potts and p-charge models. 

We first perform an expansion in powers of ^ = 2px/T. For example, 

B{I + ^ABf^ = B- ^BAB + ^"^BABAB + ... (40) 

Note, that any term b(^AB^ , where k is a positive integer, is independent of {B), and 
therefore is purely heteropolymeric. The matrix product \AI3) [AIB) . . . ( AIS) 
can be interpreted as the propagation of heteropolymeric interactions from monomer species 
ii to 12, from 12 to i^, etc., up to ik- As we suppose from the very beginning that all of 
the heterogeneity comes from the second virial coefficient only, so that all higher order 
virial terms of the original Hamiltonian are in a sense homopolymeric, all heteropolymeric 
interactions are simply pair collisions of monomers. Each monomer takes part, of course, in a 
variety of pair collisions during a very long time, i.e., in thermodynamic equilibrium. Those 
collisions are weighted with the corresponding energies, and they form chains of collisions, 
described by b(^AB^ terms. Depending on both the Bij interaction matrix and the species 
occurrence probabilities pi, some of those chains might be more or less favorable than others, 
and this determines freezing transition in the system. 
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To employ the expansion (^Oj), we first rewrite (^41) by noting that In det A = TilnA and 
A 



p A pj = TiPA, where Pij = PiPj: 

2s = Tr|ln(/ + ^A5) -^A5[/ + ^A5]"^ +^2P5A5[/ + ^A5]"^| . (41) 
Now we are in a position to perform the expansion over the powers of ^, yielding 

oo 

2s=j:e {b')^ , (42) 

where 



m 
k=2 



B^) =— — Tr 



k -l_ \ r - .^\^/ '^^\ (fc-i) 



-A - kP) B (^-AB) 



(43) 



The values (B'') can be considered as moments of 13 matrix produced by a given A matrix. 
In fact, we can make the substitution 6^- = Bij — J2kiPkPiBki, i.e. remove the "homopolymer" 
mean from the interaction matrix, and the moments can be rewritten exactly with the 
exchange Bij —>■ bij. A consequence of this symmetry is that these moments vanish in the 
homopolymer case {bij = 0). 



We now pass to analysis of two opposite extremes in the equation (^l]) . 



1. Freezing Temperature: Stiff Chain Limit 
As we are instructed by the examples of Potts and p-charge models, what is important 



in high s limit is the singularity of the right hand side of (WmI- This is obviously governed 



by high k terms of power series, which are basically related to (^—AB^ . This is reminiscent 
of the standard problems of ID statistical physics, such as the ID Ising model, the ideal 
polymer, or other Markovian processes, where (^—AB^ plays the role of the transfer matrix. 
It is well known that highest eigenvalue of the transfer matrix is only relevant in /c cx3 
limit ("ground state dominance principle"). In this limit, ^ ~ 1/Amax, where Amax is highest 
eigenvalue of (^~AB^ matrix, and thus 

Tf ~ 2pA„,ax . (44) 
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To find the next terms in asymptotic formula for Tf, we note that the most divergent 
term in eq (^) comes form the last term in (^) and is due to kP term in (^31), it 
diverges as (1 — ^Amax)~^- We know, however, that this term vanishes for both Potts and 
p-charge models. Moreover, we can show, that it vanishes also for many other models 
with some regularities, producing cancellation of correlations and anti-correlations between 
matrix elements of B. For this reason, we keep next to the highest singularity, thus obtaining 

1 



2s 



+ 



p\ ip) (ip -B/\ 



P 



(45) 



^1 'CAmax)^ (1 C^max^ 

where A and if^ are the eigenvalue and the corresponding eigenvector of the (^—AB^ oper 
ator. This gives finally 

1 + VI + 4cs 



1 + 



2s 



2pAmax 1 + ^fc/s for cs > 1 (c 7^ 0) 



(46) 



2pAmax [1 + for cs < 1 (c = 0) 

Note that Xmax, as an eigenvalue, depends strongly on the arrangement of matrix el- 
ements. Therefore freezing transition for stiff chains is very dependent on the pattern of 
interactions, not only on their overall heterogeneity. This has clear physical meaning. In 
case of stiff chains, real monomers represent the physical units of interaction. In other 
words, quasi monomers almost coincide with monomers. In terms of propagation, or chains 
of collisions (see above), it is clear that highest eigenvalue of (^—AB^ matrix corresponds to 
the lowest (because of the sign) energy of interaction, while the corresponding eigenvector, 
in terms of the obvious quantum mechanical analogy, is the linear combination of monomers 
which realizes this lowest energy and thus controls the freezing temperature. 



2. Freezing Temperature: Flexible Chain Limit 

The examination of the small s case may be on the first glance questionable, as our 
approach is entirely mean-field in nature and, therefore, it might be applicable for large 
enough s only. We have seen, however, in the examples of Potts and p-charge models, that 
the applicability of the flexible chain limit is controlled by the effective flexibility, which is 
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considerably smaller than s itself. We therefore consider formally the small s limit, leaving 
the analysis of applicability for each particular case. 

In small s limit, only the first term with = 2 is relevant in the series (|4^). Omitting 
all higher order terms, we obtain the remarkably simple result 

2p /f52\V2 



Tf = f^{B^}^ (47) 



where the second cumulant (variance) is defined as (B"^ 
averages are defined by 



B ~(B 



and matrix 



b)=Y,P,PjB.,, . (48) 



Unlike the stiff chain limit, in the flexible chain case at hand, the freezing transition is 
controlled mainly by overall heterogeneity of interaction energies Bij. Thus, if one started 
with an interaction matrix with independent elements and shuffled the matrix elements (even 
though it is hard to think of real physical experiment of this kind), this transformation would 
not change the freezing temperature for flexible chains. This is qualitatively a very natural 
result, as the nature of flexible chains is such that for any given monomer, many of the 
neighbors in space are neighbors along the chain. In other words, the interaction units are 
quasi monomers, which are substantially different from the monomers and represent clouds 
of monomers, where the individuality of each monomer species (with different patterns of 
energetical preferences to other species) is lost. 

In the case of the Potts and p-charge models, the variance of the interaction matrix 
yields the flexible chain limits for both the Potts (|3^ ) and p-charge (^) models. Thus, the 
solution (|47|) for Ty in this limit is remarkably simple and powerful. To demonstrate this, 
we show some particular examples. 



E. Independent Interaction Model 



In the Independent Interaction Model, all Bjj are taken independently from Gaussian 
distribution 
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iJ) 



B_ 
2^ 



2\ 1/2 



exp 



[B 



IJ 



52 



(49) 



(recall that capital / and J are related to monomer numbers along the chain and not to 
species). From the physical point of view, the independence of, say, Bjj and Bjk can be 
realized if and only if the total number of different species is very large, i.e., in the q —>■ 00 
limit. The effective stiffness in this limit is small, and we have to use the expression (^Tj) 
for the freezing temperature. Therefore, Tf = IpBj This indeed coincides with original 
result of the work 1131 . 



F. Random Sequences of Real Amino Acids 

It is of special interest to examine the freezing transition for polymers comprised of real 
amino acids, i.e., of constituents of real proteins. This can be done using the matrix of 
interaction energies derived for amino acids by Miyazawa and Jernigan 0]. We are in a 
position to examine the freezing transition for random sequences (even though real protein 



sequences might not be random |T6| , p!7| ). In the work the interaction matrix was given 
in the form Ui^jTuj-, where Uij is the interaction energy and Tmj is a temperature not 
formally defined in f^. In some rough approximation, we identify the MJ matrix with our 
pB. To avoid rewriting of the eq (|2^) , we substitute the MJ matrix into (|2^ instead of B^ 
meaning that now ^ = 2TMj/Tf. We assume also equal composition pi = 1/q = 1/20. We 
can then numerically calculate the ^ vs s dependence. The result is shown in Figure 1. Note 
the qualitative similarity of the graph of ^ vs s for the MJ matrix and S vs s given by (pT]). 

Given the realistic value of s ~ 1.4 {v/a^ ~ 0.25) for polypeptide chain, we obtain from 
the Figure 1 the estimate C, ~ 1-6, or Tf ^ 1.25Tmj. By taking more realistic uneven 
composition, we arrive at ,^ ^ 1.75, or Tf ^ I.IATmj- Note that for real amino acids system 
the relevant solution is generally in the high flexibility regime. 

To understand these results, recall the way that the MJ matrix was derived in The 
protein 3D structures data bank was employed such that if there were J^tj contacts between 
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amino acids labelled as i and j in the data bank, and the total number of contacts was Ai, 
then the ratio Aiij/Ai was interpreted as a probability governed by some effective Boltzmann 
distribution M.ij/A4 = exp [—Uij/TMj], thus yielding the MJ matrix of energies, Uij. In the 
later work ||18|, it was shown that the ratio Aiij/Ai obeys indeed Boltzmann type formula if 
proteins do match the random energy model, and then the parameter of distribution, Tmj, 
is nothing but the freezing temperature, Tf [|l9l. From that logic, we expect thus Tf = Tmj- 
Our result is slightly higher. We conclude thus, that there is a reasonable agreement between 
the works [B, fTSl, and our results. 



IV. CONCLUSION 

Starting from a sequence-model Hamiltonian in which interactions between species of 
monomers is expressed in terms of some arbitrary symmetric matrix B, we have derived 
a formalism with which to examine the freezing transition of random heteropolymers. As 
monomer species interactions are given by some matrix, this formulation is the most general 
form, assuming that interactions are short range and that heteropolymeric contributions 
come primarily from two-body interactions. 

First, we have related the freezing temperature to the interaction matrix self-consistently. 
This self-consistent equation can be solved exactly for certain specific systems. For example, 
models such as the g-Potts and p-charge models are important as they describe interesting 
physical cases, but with only a minimal amount of complexity in their solutions. It is 
especially interesting that these two simple models have radically different freezing behavior 
with respect to the number of species. Clearly simply adding new and different monomer 
species does not necessarily enhance the freezing transition. 

Taking another approach, we can trade the accuracy of an exact result for the generality 
of the assumption of only some arbitrary symmetric interaction matrix. To this end, we 
solved the exact self-consistent equation perturbatively. Due to the nature of the S function, 
there are two regimes of interest: small s (high effective flexibility) where S ^ and s — > oo. 



21 



where S approaches a singularity in the self-consistent formulation. Expanding at these two 
limits, we found 

f {2plJs) (b^Y^^ for small s 
Tf-< \ 

[ pAinax for large s 

where Amax is the largest eigenvalue of the —AB matrix. 

The equation above quantitatively details certain descriptions of what one could qual- 
itatively call the "heteropolymeric character" of the interaction matrix B and the species 
composition p. Specifically, for flexible chains, one would expect that the physical unit of 
interactions, or quasi monomers, consist of several monomers. The variance of the interac- 
tion matrix gives, in a sense, the heteropolymeric width of interactions. If these interaction 
energies are ordered in the interaction matrix, however, the correlations between monomer 
species interactions reduces the heteropolymeric nature of the system, and thus reduces the 
freezing temperature. 

In the limit of stiff chains, quasi monomers generally consist of individual monomers. 
Thus, the specific nature of interactions are of paramount importance. In this limit, one can 
imagine the interactions in space (i.e. not necessarily along the chain) as interactions prop- 
agating through the pairwise interactions of monomer species. This chain of interactions, in 
the stiff polymer limit, becomes very long and thus the system shares characteristics with 
other one- dimensional systems, such as the ID Ising model. Specifically, here the freezing 
temperature is proportional to the largest eigenvalue, which dominates in the long interac- 
tion chain limit, of the transfer matrix —AB. 

To conclude, we comment on the applicability of this theory. Three points are to be 
mentioned here. First, since we truncate the series ([TT|) to (9(0^), we cannot comment 
on the nature of any phase transitions in the average value of 0, such as the microphase 
segregation seen in two letter ferromagnetic interaction matrices 0]. Second, all calculations 
have been mean field and therefore effects due to the fluctuations in the order parameter Qap 
have been neglected. At present, these effects have been examined in one-loop approximation 
only P(]| , PT1| . For further details, we refer the reader to the discussions in [|,|T3]- Even though 
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the issue remains open, to the best of our understanding at this moment, mean field (fP' 
theory is definitely applicable in the vicinity of freezing transition, at least when the number 
of monomer species is more than two. Third, and last, we have ignored the possibility of 
liquid crystalline ordering in the stiff chain regime. This issue has not been examined at all, 
we can only guess qualitatively that orientational ordering should be suppressed in strongly 
non-uniform heteropolymer system compared to a homopolymer of comparable stiffness. 

Thus, for models with "heteropolymeric character," i.e., the interaction matrix and prob- 
ability distribution cannot be reduced to that of a homopolymer, our theory predicts a freez- 
ing transition. Our formalism facilitates the calculation of specific models of interactions, 
but perhaps most importantly, the direct relationship between the interaction matrix and 
the freezing transition is demonstrated. 
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APPENDIX A: PROOF OF EQUATION (20) 

1. Consider first the auxiliary problem of some xxx matrix g*^^^ with diagonal elements q 
and off diagonal elements q. This matrix has a (x — l)-fold degenerate eigenvalue )^ = q — q, 
corresponding to the eigenvectors (1 —1 ... 0), (1 —1 ... 0), 
(1 ... —1 ... ),...,(! ... —1 ), and a non-degenerate eigen- 
value of A = g + (x — l)q, corresponding to the eigenvector (1 1 1 ... 1 ). Of course, 
there are other ways of choosing eigenvectors, in particular, we can built up orthonormal 
basis by choosing 



TZap = -X= exp 



X 



l<a,P<x. (Al) 



Here a numerates eigenvectors, while (3 numerates components of the given eigenvector (or 
vice versa). We can interpret TZ^^^ = TZap as the unitary operator transforming (f^^ to 
diagonal form, TZqJZ^^ = X^^^ = Aq^q/j, with the eigenvalues Aq, given above. Q We will be 
particularly interested in the case q = q = 1. In this case, the non-degenerate eigenvalue is 
A = 1, while all the others are zero. 

2. Consider now some general properties of the "direct product" operation for matrices. 
We repeat the definition: A^^^0l3'^'^^ is rs x rs, built up by substitution of s x s block A^vB^^^ 
instead of each matrix element of A^'^\ 

1. By matrix row and column operations, it is easy to show that the rule is commutative, 
i.e. 

® = ® . (A2) 

2. Block matrix multiplication rule: it is well known that the operation of block matrix 
multiplication is carried out in the same scheme as normal matrix multiplication. 



^For completeness, we write also the inverse of (f^^: it has diagonal elements {q — q) ^ — q{{q 
q)[q + {x — l)q]}~^ and off diagonal elements —q{{q — q)[q + {x — l)q]}~^. 
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except the multiplication of elements is replaced by the matrix multiplication of blocks. 
This can be written as 



(A3) 



3. Commutation of A^'^^ (g> B^'^^ and A''^''^ (S) 5''^'^) depends on commutation of both pairs 
A^^'^SzA'^^^ and I3^^^&cI3'^^'> (this directly follows from previous.) 

4. The determinant of a block diagonal matrix equals to the product of determinants of 
the diagonal blocks. In particular, 



det ® 7(^)) = (det A'-'-' 



(A4) 



5. The definition of direct product can be trivially generalized for non-square matrices 
and, in particular, for vectors f\ . For example. 



^nq)\ ^ ^n) (g, p<g) _ 



6. Matrix operation with a vector. 



(A5) 



7. Scalar product of vectors: 



® a'« ® = (aM| a'^) 



(A6) 



The proof of all the above mentioned properties is straightforward. 

3. Let us return now to the expression of energy ([T9|). We have to address the matrix 
j{qn) _|_ Mq{n) (g, /\iQ) j^ii) . We kuow (or we assume) that g^"^ is comprised of n/x (f^^ blocks 
along the diagonal, with q = q = 1, that is g^"-* = /("/^) 0q^^\ First, this form of g*-"-* matrix 
allows us to factor the matrix of our interest: 



2^(rxr') ^ jj{sxs') -g gg^erally the matrix rs x r's' 
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f{qn) _^ ^^n) ^ ^(q) 



j{n/x) 



f(gx) _^ ^^(x) ^ ^{g)^(g) 



(A7) 



This means physically that replicas of different groups are not coupled, they do not interact 
to each other. 

The remainder (in the square brackets in the right hand side of can be diagonalized 
via the rotation operator TZ^^'^^ = 7^(^) (g) Indeed, using properties 2 and 3 above, we 
have: 



(A8) 



Recall that there is only one non-zero A, and therefore the last matrix has one g x g block 
{2p/T)A^i^Mi^ in the upper-left corner, it has 1 down this block on the main diagonal, and 
all other matrix elements are 0. 

We are now in a position to simplify the first term of energy (p^Of ). First, we apply 
the rule 4 to this energy term, then we note that determinant does not change upon ro- 
tation (|A8|) , while the determinant of the right hand side of ( |A^ ) is trivially computed, 
yielding 



In det 



71 

— In det 

X 



/(<?) I ^ Ato) Big) 
T 



(A9) 



As for the second term in (|T9|), we first apply the rule 7 to get 



j{qn) ^ ^qin) ^iq) §{g)'' 



T 



n 



X 



P 



T 



(AlO) 



We then use the rotation ( [A^ ) and note that 13^''^ ® and T?.'-'^^-' do commute to each other 
due to the rule 3. This yields the form 

2P' 



X \ 



T 



-1 



(All) 



We consider, therefore, the rotation of density vector 



pi^i))_ First, we note that ,0^^''^ 



Second, the density, as the physical quantity, is the same for all replicas and 
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does not depend on replica indices. To write it formally, let us define two a;- dimensional 
vectors i''^^ = (1 1 1 ••• 1) and j^^^ = {1 ... 0). Then we see by direct 



implementation of formula (eq:orthonormbasis) TZ^^^ 



On the other hand, 



P 



pi^^\ Therefore, according to the rule 5, we have TZ^'^ 



1) 



py/xj^^'>(^ff'^\ This 



yields the energy term in the form 



2rt^ _ _ -.-1 



(A12) 



As j'^^^ has only one non- zero-component, and A*-^-* has also only one non-zero matrix element, 
corresponding to the same direction in vector a;-dimensional space, we have 



(A13) 



The last step is to implement the scalar product rule 7, yielding 



X \ 



T 



T 



(A14) 



Combining (|A9|) with ( |A14|) , we arrive at (pOl). 
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FIGURE CAPTIONS 

Figure 1: 

a) Plot of the inverse reduced freezing temperature (S) vs the effective flexibihty (cr), with 
the inset of the graph showing the detail of the small s vs small S regime. The important 
characteristics of this function is that it is described by for small S and the existence 
of a singularity at S = 1. The solid line denotes the exact solution, the unevenly dashed 
line denotes the stiff chain expansion, and the evenly dashed line denotes the flexible chain 
expansion. 

b) For the Miyazawa and Jernigan matrix of amino acid interactions, we plot the flexibility 
(s) vs the reduced inverse freezing temperature (^), with the inset of the graph showing the 
detail of the small s vs small ^ regime. Qualitatively, this curve is similar to S(cr). Further 
note, however, that any physical polymer will be described by the small ^ regime. The 
dashing of curves denotes the same approximations as in part (a). 
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